A method for database administration and replication, comprising the steps of:

Providing a database management system with an integrated (built-in) random sampling facility;
Selecting a default sample size value S; 

Selectively receiving a desired sample size value D and setting said default sample size value S to 
said desired sample size value D when said desired sample size value D is received; 

Randomly sampling S records of the database using said random sampling facility; 

Wherein the step of sampling said S records includes randomly sampling the S records
utilizing dataspaces including:

At least one index dataspace; 
At least one key dataspace; and, 
At least one statistics dataspace; 

Storing statistics for each of said S records, wherein said statistics include a record key for each
record; and 

Producing at least one of: 

An extrapolated replication partition analysis based on said statistics; and 
A partial replication partition analysis based on said statistics;

Wherein the step of producing at least one of said partition analyses includes the step of defining
multiple partition boundaries;
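
The claimed steps can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the in-memory list of (key, payload) records, the default of 1000, the equal-count rule for choosing boundaries, and the scaling factor (total records divided by sample size) are all assumptions made for illustration.

```python
import random


def sample_keys(records, default_size=1000, desired_size=None, seed=None):
    """Randomly sample S records and return their keys in sorted order.

    S is the default size unless a desired size D is supplied.
    `records` is assumed to be a list of (key, payload) tuples.
    """
    s = desired_size if desired_size is not None else default_size
    s = min(s, len(records))           # cannot sample more than exists
    rng = random.Random(seed)
    sampled = rng.sample(records, s)   # sampling without replacement
    return sorted(key for key, _payload in sampled)


def partition_boundaries(sorted_keys, partitions):
    """Pick partitions-1 boundary keys splitting the sample into near-equal parts."""
    if partitions < 2 or not sorted_keys:
        return []
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // partitions] for i in range(1, partitions)]


def extrapolate(sample_count, sample_size, total_records):
    """Scale a count observed in the sample up to the full database (assumed ratio)."""
    return sample_count * total_records / sample_size
```

For example, calling `partition_boundaries(sample_keys(records, desired_size=400), 4)` yields three boundary keys, and `extrapolate` estimates how many records of the full database fall into a partition from the number of sampled keys that did.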
File 411:DIALINDEX(R)

DIALINDEX(R)
(c) 2004 The Dialog Corporation plc

DIALINDEX search results display in an abbreviated
format unless you enter the SET DETAIL ON command.
?set file sall

>>>"SALL" is not a valid Dialindex category 
>>>No valid files specified 
?set files all 

You have 556 files in your file list. 

(To see banners, use SHOW FILES command) 
?s (random?? (3n) sampl???) (100n) (dataspace? ? or data()space? ?)

Your SELECT statement is:

s (random?? (3n) sampl???) (100n) (dataspace? ? or data()space? ?)

Items   File

    3     2: INSPEC_1969-2004/Jan W1
    1     7: Social SciSearch(R)_1972-2004/Jan W2
    1     8: Ei Compendex(R)_1970-2004/Jan W1
    3    34: SciSearch(R) Cited Ref Sci_1990-2004/Jan W2
Examined 50 files
    1    94: JICST-EPlus_1985-2004/Jan W1
    1   144: Pascal_1973-2004/Jan W1
Examined 100 files
    1   148: Gale Group Trade & Industry DB_1976-2004/Jan 15
    1   155: MEDLINE(R)_1966-2004/Jan W2
    1   180: Federal Register_1985-2004/Jan 14
Examined 150 files
Examined 200 files
    2   340: CLAIMS(R)/US Patent_1950-03/Jan 13
    1   348: EUROPEAN PATENTS_1978-2004/Jan W02
    3   349: PCT FULLTEXT_1979-2002/UB=20031225,UT=20031218
Examined 250 files
    3   440: Current Contents Search(R)_1990-2004/Jan 15
Examined 300 files
Examined 350 files
Examined 400 files
Processing
Processing
    6   654: US Pat.Full._1976-2004/Jan 13
Examined 450 files
Examined 500 files
Examined 550 files



14 files have one or more items; file list includes 556 files. 



File   2:INSPEC 1969-2004/Jan W1
         (c) 2004 Institution of Electrical Engineers
File   7:Social SciSearch(R) 1972-2004/Jan W2
         (c) 2004 Inst for Sci Info
File   8:Ei Compendex(R) 1970-2004/Jan W1
         (c) 2004 Elsevier Eng. Info. Inc.
File  34:SciSearch(R) Cited Ref Sci 1990-2004/Jan W2
         (c) 2004 Inst for Sci Info
File  94:JICST-EPlus 1985-2004/Jan W1
         (c) 2004 Japan Science and Tech Corp(JST)
File 144:Pascal 1973-2004/Jan W1
         (c) 2004 INIST/CNRS
File 148:Gale Group Trade & Industry DB 1976-2004/Jan 15
         (c) 2004 The Gale Group
File 155:MEDLINE(R) 1966-2004/Jan W2
         (c) format only 2004 The Dialog Corp.
File 180:Federal Register 1985-2004/Jan 14
         (c) 2004 format only The DIALOG Corp
File 340:CLAIMS(R)/US Patent 1950-03/Jan 13
         (c) 2004 IFI/CLAIMS(R)
File 348:EUROPEAN PATENTS 1978-2004/Jan W02
         (c) 2004 European Patent Office
File 349:PCT FULLTEXT 1979-2002/UB=20031225,UT=20031218
         (c) 2003 WIPO/Univentio
File 440:Current Contents Search(R) 1990-2004/Jan 15
         (c) 2004 Inst for Sci Info
File 654:US Pat.Full. 1976-2004/Jan 13
         (c) Format only 2004 The Dialog Corp.

Set     Items   Description
S1         28   (RANDOM??(3N)SAMPL???)(100N)(DATASPACE? ? OR DATA()SPACE? -
             ?)
S2         18   RD (unique items)



2/5/3 (Item 3 from file: 2) 

DIALOG (R) File 2:INSPEC 

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

5034359 INSPEC Abstract Number: B9510-6140C-469, C9510-1250-239
Title: Registration of 3-D images by genetic optimization 

Author(s): Jacq, J. -J.; Roux, C. 

Author Affiliation: Dept. Image et Traitment de l'Inf., Telecom Bretagne, 
Brest, France 

Journal: Pattern Recognition Letters vol.16, no.8 p. 823-41
Publication Date: Aug. 1995 Country of Publication: Netherlands
CODEN: PRLEDG ISSN: 0167-8655

U.S. Copyright Clearance Center Code: 0167-8655/95/$09.50
Language: English Document Type: Journal Paper (JP)
Treatment: Applications (A); Theoretical (T)

Abstract: We present a framework for solving the 3D registration problem
in medical imaging based on a canonical genetic algorithm (CGA). The issue
of 3D registration is stated as an optimization problem in both application
cases presented, i.e., volume-to-volume and surface-to-volume registration.
The CGA uses a stochastic fitness function which operates on randomly
selected samples of the data space. At a higher level, an adaptive
search space scaling technique is presented which operates by successive 
activations of the CGA procedure. The former features ensure a lower 
complexity of the search algorithm and a good accuracy of the final 
solution. Volume-to-volume and surface-to-volume registration are then 
considered. The features that are specific to the application (the actual 
optimization space, the fitness or distance function, the GA parameters) 
are introduced. Results concerning two registration problems using 3D 
computerized tomography data are presented and discussed. (17 Refs) 

Subfile: B C 

Descriptors: biomedical imaging; computerised tomography; genetic 
algorithms; image reconstruction; image registration; medical image 
processing; search problems; stereo image processing; tomography 

Identifiers: 3D image registration; medical imaging; canonical genetic 
algorithm; 3D computerized tomograph; optimization; volume-to-volume 
registration; surface-to-volume registration; stochastic fitness function; 
data space; adaptive search space scaling; search algorithm; shape 
reconstruction 

Class Codes: B6140C (Optical information, image and video signal
processing); B7510 (Biomedical measurement and imaging); B0260
(Optimisation techniques); C1250 (Pattern recognition); C7330 (Biology and
medical computing); C5260B (Computer vision and image processing
techniques); C1180 (Optimisation techniques)

Copyright 1995, IEE 



2/5/4 (Item 1 from file: 94) 

DIALOG (R) File 94: JICST-EPlus

(c)2004 Japan Science and Tech Corp(JST). All rts. reserv. 

04727330 JICST ACCESSION NUMBER: 01A0216796 FILE SEGMENT: PreJICST-E 
Data Sampling for Evaluation of Structural Diversity of Chemical Compounds. 

TAKEZAWA HIROSHI (1); TAKAHASHI YOSHIMASA (1) 
(1) Toyohashi Univ. of Technol.

Joho Kagaku Toronkai, Kozo Kassei Sokan Shinpojiumu Koen Yoshishu, 2000,
VOL.23rd-28th, PAGE.208-211
JOURNAL NUMBER: X0081AAK 

LANGUAGE: Japanese COUNTRY OF PUBLICATION: Japan 

DOCUMENT TYPE: Conference Proceeding 
MEDIA TYPE: Printed Publication 

ABSTRACT: This paper describes a data sampling method for the evaluation of 
structural diversity of chemical compounds. Three different types of 
 methods (random sampling, cell partitioning method and
clustering-based method) were investigated using a trial set of 5000 
points prepared by two-dimensional random numbers. For cell 
partitioning method, two different approaches were tested: the sample 
distribution density was taken account for one, and not for the other. 
The results showed that the cell partitioning method with taking the 
density gives the most diverse sampling on that space. The method was 
applied to diverse sampling of chemical structures on a higher 
dimensional structural feature space characterized by topological 
 fragment spectra. For this case, data sampling was carried out on a
 reduced data space that is produced by mathematical mapping. The
 result also validated the usability of the cell partitioning approach
 combined with the space reduction. (author abst.)



2/3,K/1 (Item 1 from file: 2)

DIALOG (R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv. 

6908985 INSPEC Abstract Number: C2001-06-6130-001 

Title: Nonlinear mapping of massive data sets by fuzzy clustering and 
neural networks 

Author (s): Rassokhin, D.N.; Lobanov, V.S.; Agrafiotis, D.K. 
Author Affiliation: 3-Dimensional Pharm. Inc., Exton, PA, USA 
Journal: Journal of Computational Chemistry vol.22, no.4 p. 373-86
Publisher: Wiley, 

Publication Date: March 2001 Country of Publication: USA 

CODEN: JCCHDD ISSN: 0192-8651 

SICI: 0192-8651(200103)22:4L.373:NMMD;1-0

Material Identity Number: J333-2001-003 

Language: English 

Subfile: C 

Copyright 2001, IEE

...Abstract: to relatively small data sets. We recently demonstrated that
nonlinear maps derived from a small random sample of a large data set
exhibit the same structure and characteristics as that of the...

...algorithm based on local learning. The method employs a fuzzy clustering
methodology to partition the data space into a set of Voronoi
polyhedra, and uses a separate neural network to perform the...

2/3,K/2 (Item 2 from file: 2) 

DIALOG (R) File 2: INSPEC 

(c) 2004 Institution of Electrical Engineers. All rts. reserv. 

6829357 INSPEC Abstract Number: C2001-03-7320-019
Title: Nonlinear mapping networks 
Author (s): Agrafiotis, D.K.; Lobanov, V.S. 

Author Affiliation: 3-Dimensional Pharm. Inc., Exton, PA, USA 
Journal: Journal of Chemical Information and Computer Sciences vol.40, 
no. 6 p. 1356-62 
Publisher: ACS, 

Publication Date: Nov.-Dec. 2000 Country of Publication: USA

CODEN: JCISD8 ISSN: 0095-2338

SICI: 0095-2338(200011/12)40:6L.1356:NMN;1-9

Material Identity Number: J263-2000-006

U.S. Copyright Clearance Center Code: 0095-2338/2000/$19.00
Language: English 
Subfile: C 
Copyright 2001, IEE 

...Abstract: unique for their conceptual simplicity and ability to
reproduce the topology and structure of the data space in a faithful
and unbiased manner. However, a major shortcoming of these methods is their

...the principle of probability sampling, the method employs a classical
algorithm to project a small random sample, and then "learns" the
underlying nonlinear transform using a multilayer neural network trained
with the...



2/3,K/3 (Item 3 from file: 2)

DIALOG (R) File 2: INSPEC 

(c) 2004 Institution of Electrical Engineers. All rts. reserv. 

5034359 INSPEC Abstract Number: B9510-6140C-469, C9510-1250-239
Title: Registration of 3-D images by genetic optimization 
Author(s): Jacq, J. -J.; Roux, C. 

Author Affiliation: Dept. Image et Traitment de l'Inf., Telecom Bretagne, 
Brest, France 

Journal: Pattern Recognition Letters vol.16, no.8 p. 823-41
Publication Date: Aug. 1995 Country of Publication: Netherlands
CODEN: PRLEDG ISSN: 0167-8655

U.S. Copyright Clearance Center Code: 0167-8655/95/$09.50
Language: English 
Subfile: B C 
Copyright 1995, IEE 

...Abstract: and surface-to-volume registration. The CGA uses a
stochastic fitness function which operates on randomly selected samples
of the data space. At a higher level, an adaptive search space scaling
technique is presented which operates by...



2/3,K/4 (Item 1 from file: 94) 

DIALOG (R) File 94: JICST-EPlus

(c)2004 Japan Science and Tech Corp(JST). All rts. reserv. 

04727330 JICST ACCESSION NUMBER: 01A0216796 FILE SEGMENT: PreJICST-E 
Data Sampling for Evaluation of Structural Diversity of Chemical Compounds. 

TAKEZAWA HIROSHI (1); TAKAHASHI YOSHIMASA (1) 
(1) Toyohashi Univ. of Technol.

Joho Kagaku Toronkai, Kozo Kassei Sokan Shinpojiumu Koen Yoshishu, 2000,
VOL.23rd-28th, PAGE.208-211
JOURNAL NUMBER: X0081AAK 

LANGUAGE: Japanese COUNTRY OF PUBLICATION: Japan 

DOCUMENT TYPE: Conference Proceeding 
MEDIA TYPE: Printed Publication 

...ABSTRACT: method for the evaluation of structural diversity of chemical
 compounds. Three different types of methods (random sampling, cell
 partitioning method and clustering-based method) were investigated
 using a trial set of 5000...

...by topological fragment spectra. For this case, data sampling was
 carried out on a reduced data space that is produced by
 mathematical mapping. The result also validated the usability of the
 cell...



2/3,K/5 (Item 1 from file: 148)

DIALOG (R) File 148:Gale Group Trade & Industry DB 
(c)2004 The Gale Group. All rts. reserv. 

16071212 SUPPLIER NUMBER: 101941076 (USE FORMAT 7 OR 9 FOR FULL TEXT)

Current labor statistics. 

Monthly Labor Review, 126, 3, 31(66) 
March, 2003 

ISSN: 0098-1818 LANGUAGE: English RECORD TYPE: Fulltext 

WORD COUNT: 28018 LINE COUNT: 08590 



Injuries and Illnesses 
Description of the series 

The Survey of Occupational Injuries and Illnesses collects data
from employers about their workers' job-related nonfatal injuries and
illnesses. The information that employers...

...Federal-State cooperative program with an independent sample selected
for each participating State. A stratified random sample with a Neyman
allocation is selected to represent all private industries in the State.
The...



2/3,K/6 (Item 1 from file: 180)

DIALOG (R) File 180: Federal Register 

(c) 2004 format only The DIALOG Corp. All rts. reserv. 



DIALOG Accession Number: 02274118 Supplier Number: 930201997

Privacy Act of 1974; Reissuance of DOD Systems of Records Notices
Volume: 58 Issue: 33 Page: 10002

CITATION NUMBER: 58 FR 10002

Date: MONDAY, FEBRUARY 22, 1993



2/3,K/7 (Item 1 from file: 340)

DIALOG (R) File 340: CLAIMS(R)/US Patent
(c) 2004 IFI/CLAIMS(R). All rts. reserv.

10260572 2003-0004973

E/ RANDOM SAMPLING AS A BUILT-IN FUNCTION FOR DATABASE ADMINISTRATION AND
REPLICATION 

Inventors: Harper John William (US); Slishman Gordon Robert (US) 
Assignee: International Business Machines Corp 
Assignee Code: 42640 

          Publication                   Application
   Kind   Number           Date         Number          Date
   A1     US 20030004973   20030102     US 2001897803   20010702
Priority Applic:                        US 2001897803   20010702



Non-exemplary Claims: ...as set forth in claim 6, wherein the step of
sampling said S records includes randomly sampling the S records
utilizing dataspaces including: at least one index dataspace; at
least one key dataspace; and, at least one statistics dataspace.

...15. A database management system (DBMS) for managing an associated
database, the DBMS comprising: random sampling facility integrated
with the database management system; first database analysis tools using
said integrated random sampling facility for generating extrapolated
reports on database content; second database analysis tools using said
integrated random sampling facility for generating extrapolated
reports on database size; and, database replication tools adapted to
execute



2/3,K/8 (Item 2 from file: 340)

DIALOG (R) File 340: CLAIMS(R)/US Patent
(c) 2004 IFI/CLAIMS(R). All rts. reserv.

10260543 2003-0004944

E/ PARTITION BOUNDARY DETERMINATION USING RANDOM SAMPLING ON VERY LARGE
DATABASES

Inventors: Harper John William (US); Slishman Gordon Robert (US)
Assignee: International Business Machines Corp 
Assignee Code: 42640 

          Publication                   Application
   Kind   Number           Date         Number          Date
   A1     US 20030004944   20030102     US 2001897853   20010702
Priority Applic:                        US 2001897853   20010702



Non-exemplary Claims: ...12. The method as set forth in claim 1, wherein
the step of randomly sampling said S records includes randomly
sampling the S records utilizing dataspaces including: at least one
index dataspace; at least one key dataspace; and, at least one
statistics dataspace.

...program routine having a random number generating algorithm; a second
computer program routine having a random sampling facility utilizing
said first program routine to randomly read records from a database and
store



2/3,K/9 (Item 1 from file: 348)

DIALOG (R) File 348: EUROPEAN PATENTS

(c) 2004 European Patent Office. All rts. reserv. 

01067933 

WEIGHTLESS BINARY N-TUPLE THRESHOLDING HIERARCHIES
HIERARCHISCHE STRUKTUR ZUR SCHWELLENVERGLEICHUNG UNGEWOGENER BINARDATEN
HIERARCHIES DE DEFINITION DE SEUILS N-TUPLES, BINAIRES ET SANS POIDS

PATENT ASSIGNEE:
BAE SYSTEMS plc, (427897), Warwick House, P.O. Box 87, Farnborough
Aerospace Centre, Farnborough, Hampshire GU14 6YU, (GB), (Proprietor
designated states: all)
INVENTOR:
KING, Douglas Beverley Stevenson, British A.M.A.andLA, Electr. Eng., W354 B,
W. Aerodrome, Warton, Nr Preston, Lancs. PR4 1AX, (GB)
MACDIARMID, Ian Peter, British A.M.A. and Aero, Elect., W423, W. Aerodrome,
Warton, Nr Preston, Lancs. PR4 1AX, (GB)
MOORE, Colin, British A.M.A. and A, elec. Eng., W354 B, W. Aerodrome,
Warton, Nr Preston, Lancs. PR4 1AX, (GB)
LEGAL REPRESENTATIVE:
Newell, William Joseph (53194), Wynne-Jones, Laine & James 22 Rodney Road
, Cheltenham Gloucestershire GL50 1JJ, (GB)
PATENT (CC, No, Kind, Date): EP 1040408 A1 001004 (Basic)
                             EP 1040408 B1 020821
                             WO 99032962 990701
APPLICATION (CC, No, Date): EP 98962564 981218; WO 98GB3837 981218
PRIORITY (CC, No, Date): GB 9726752 971219; GB 9823382 981027
DESIGNATED STATES: DE; ES; FR; GB; IT; NL; SE
INTERNATIONAL PATENT CLASS: G06F-007/02; G06F-015/80
NOTE:
No A-document published by EPO
LANGUAGE (Publication,Procedural,Application): English; English; English
FULLTEXT AVAILABILITY:
Available Text   Language    Update    Word Count
CLAIMS B         (English)   200234      607
CLAIMS B         (German)    200234      557
CLAIMS B         (French)    200234      695
SPEC B           (English)   200234     2376
Total word count - document A               0
Total word count - document B            4235
Total word count - documents A + B       4235



...SPECIFICATION is practically viable. 

It is important to randomly connect the pattern matcher outputs between 
the data space and the hierarchical structure because the pattern 
matchers can often "clump" results, e.g. 1111111011000000001... 
...of the first layer sum and threshold devices 14 of Figure 2 is seen to 
randomly sample the data space.

There may be certain applications that do not want this random mapping
- in general, if...

2/3,K/10 (Item 1 from file: 349)

DIALOG (R) File 349: PCT FULLTEXT

(c) 2003 WIPO/Univentio. All rts. reserv.

00837972 **Image available**

SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR REPRESENTING OBJECT 

RELATIONSHIPS IN A MULTIDIMENSIONAL SPACE 
SYSTEME, PROCEDE ET PROGICIEL POUR LA REPRESENTATION DE RELATIONS ENTRE 

OBJETS DANS UN ESPACE MULTIDIMENSIONAL 

Patent Applicant/Assignee:

3-DIMENSIONAL PHARMACEUTICALS INC, Eagleview Corporate Center, Suite 104,
665 Stockton Drive, Exton, PA 19341, US, US (Residence), US
(Nationality), (For all designated states except: US)
Patent Applicant/Inventor:

AGRAFIOTIS Dimitris K, 660 Perimeter Drive, Downingtown, PA 19335, US, US
(Residence), US (Nationality), (Designated only for: US)
RASSOKHIN Dmitrii N , 101 Parker Court, Exton, PA 19341, US, US 

(Residence), RU (Nationality), (Designated only for: US) 
LOBANOV Victor S, 815 Azalea Drive, North Brunswick, NJ 08902, US, US 

(Residence), RU (Nationality), (Designated only for: US) 
SALEMME F Raymond, 1970 Timber Lakes Drive, Yardley, PA 19067, US, US 
(Residence), US (Nationality), (Designated only for: US) 
Legal Representative: 

LEE Michael Q (et al) (agent), Sterne, Kessler, Goldstein, & Fox 
P.L.L.C., Suite 600, 1100 New York Avenue, N.W., Washington, DC 
20005-3934, US, 
Patent and Priority Information (Country, Number, Date):

Patent: WO 200171624 A1 20010927 (WO 0171624)

Application: WO 2001US8974 20010322 (PCT/WO US0108974)

Priority Application: US 2000191108 20000322 
Designated States: AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU 
CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR 
KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE 
SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW 
(EP) AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR 
(OA) BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG 
(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZW 
(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 10642 

Fulltext Availability: 
Detailed Description 

Detailed Description 

The approach employs an iterative algorithm based on subset
refinements to nonlinearly map a small random sample which reflects
the overall structure of the data, and then "learns" the underlying
nonlinear transform...

...networks, each specializing in a particular domain of the feature space.
The partitioning of the data space can be carried out using a



2/3,K/11 (Item 2 from file: 349)

DIALOG (R) File 349: PCT FULLTEXT

(c) 2003 WIPO/Univentio. All rts. reserv.

00753772 **Image available**

METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR NON-LINEAR MAPPING OF
MULTI-DIMENSIONAL DATA
PROCEDE, SYSTEME ET PROGRAMME INFORMATIQUE D'APPLICATION NON LINEAIRE DE
DONNEES MULTIDIMENSIONNELLES

Patent Applicant/Assignee:

3-DIMENSIONAL PHARMACEUTICALS INC, Eagleview Corporate Center, Suite 104,
665 Stockton Drive, Exton, PA 19341, US, US (Residence), US
(Nationality)
Inventor(s):

AGRAFIOTIS Dimitris K, 660 Perimeter Drive, Downingtown, PA 19335, US 
LOBANOV Victor S, 24305 Cornerstone Drive, Yardley, PA 19067, US 
SALEMME Francis R, 1970 Timber Lakes, Yardley, PA 19067, US 

Legal Representative: 

LEE Michael Q, Sterne, Kessler, Goldstein & Fox P.L.L.C., Suite 600, 1100
New York Avenue, N.W., Washington, DC 20005-3934, US

Patent and Priority Information (Country, Number, Date):

Patent: WO 200067148 A1 20001109 (WO 0067148)

Application: WO 2000US11838 20000503 (PCT/WO US0011838) 

Priority Application: US 99303671 19990503 

Designated States: AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE
DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC
LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK
SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

(EP) AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE 

(OA) BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG 

(AP) GH GM KE LS MW SD SL SZ TZ UG ZW 

(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 14837 

Fulltext Availability:
Detailed Description

Detailed Description

...alone for their conceptual elegance and ability to reproduce the
topology and structure of the data space in a faithful and unbiased
manner. Unfortunately, all known algorithms exhibit quadratic time
complexity which...

...principle of probability sampling, the method employs an algorithm to
multi-dimensionally scale a small random sample, and then "learns"
the underlying non-linear transform using a multi-layer perceptron.



2/3,K/12 (Item 3 from file: 349) 

DIALOG (R) File 349: PCT FULLTEXT

(c) 2003 WIPO/Univentio. All rts. reserv. 



00501610 **Image available** 

WEIGHTLESS BINARY N-TUPLE THRESHOLDING HIERARCHIES 

HIERARCHIES DE DEFINITION DE SEUILS N-TUPLES, BINAIRES ET SANS POIDS 

Patent Applicant/Assignee:

BRITISH AEROSPACE PUBLIC LIMITED COMPANY,
KING Douglas Beverley Stevenson;,
MACDIARMID Ian Peter;,
MOORE Colin;,
Inventor(s):

KING Douglas Beverley Stevenson;,
MACDIARMID Ian Peter;,
Detailed Description

Detailed Description

... is practically viable.

It is important to randomly connect the pattern matcher
outputs between the data space and the hierarchical structure
because the pattern matchers can often "clump" results,
e.g... of the first layer sum and threshold devices 14
of Figure 2 is seen to randomly sample the data space.

There may be certain applications that do not want this
random mapping - in general, if...
Description of the Invention:

...alone for their conceptual elegance and ability to reproduce the
topology and structure of the data space in a faithful and unbiased
manner. Unfortunately, all known algorithms exhibit quadratic time
complexity which...

...principle of probability sampling, the method employs an algorithm to
multi-dimensionally scale a small random sample, and then "learns"
the underlying non-linear transform using a multi-layer perceptron
trained with...
Summary of the Invention:

...0028] Raw partition analysis, without random sampling analysis,
places a heavy strain on a computer system in terms of memory usage and
typically requires multiple dataspaces. Random sampling relieves
the strain on the computer system in terms of processing and memory
requirements. Much less memory is required to analyze 20,000 sampled
records using the random sampling approach than to analyze
2,000,000,000 records without sampling. However, in order to...

...with an unsampled approach which may be desirable under some
circumstances, the preferred method using random sampling analysis
utilizes one or more of each of the following types of dataspaces:
index, key and statistics...

Description of the Invention: 

...contain up to 8 gigabytes (GB) in keys, on a computer system having
RAM 20 dataspaces of up to 2 [sup] 31 bytes (2 GB), four dataspaces are
required to store the keys. Another 2 GB are sufficient to store indices
to the keys. However, the record statistics, even when compressed, may
require dozens of dataspaces. To minimize the effort of storing and
sorting, the present invention randomly samples a database and
produces an extrapolated partition analysis 24 providing sufficiently
accurate results. Preferably, the sample size selected is sufficiently
small so that three dataspaces will suffice, one each for indices,
keys, and statistics...
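
The dataspace count quoted in this excerpt follows from ceiling division, which the sketch below works through. The helper name `dataspaces_needed` is hypothetical, and treating 1 GB as 2^30 bytes is an assumption; only the 8 GB and 2^31-byte figures come from the excerpt.

```python
DATASPACE_BYTES = 2**31  # one dataspace holds up to 2^31 bytes (2 GB), per the excerpt
GB = 2**30               # assumption: 1 GB taken as 2^30 bytes


def dataspaces_needed(total_bytes, dataspace_bytes=DATASPACE_BYTES):
    """Smallest number of dataspaces that can hold total_bytes (ceiling division)."""
    return -(-total_bytes // dataspace_bytes)


keys_unsampled = dataspaces_needed(8 * GB)     # 8 GB of keys for the full database
indices_unsampled = dataspaces_needed(2 * GB)  # 2 GB of indices to the keys
```

Under these assumptions the 8 GB of keys occupy four dataspaces and the 2 GB of indices occupy one, consistent with the figures quoted in the excerpt.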

...0044] An analysis program 16, in communication with DBMS 14, partitions
a random sample size of S records, and then scales the tabulated
numbers by the ratio of the... 0059] The memory required by a partition
analysis, even when random sampling is employed, can be large and,
consequently, multiple dataspaces may be required. For databases
organized with indexes and keys, sampling may require one or more
dataspaces, e.g. one or more index dataspaces, one or more key
dataspaces, and one or more statistics dataspaces.

...0060] After the random sampling has been performed by sampling 
facility 26, and analysis program 18 has performed a partition 

Non-exemplary or Dependent Claim(s):

...as set forth in claim 6, wherein the step of sampling said S records
includes randomly sampling the S records utilizing dataspaces
including: at least one index dataspace; at least one key
dataspace; and, at least one statistics dataspace.

...sample size and setting said number S equal to said particular number; a
means for randomly sampling S records of the database using said
random sampling facility; a means for storing statistics
Summary of the Invention:

0027] Raw partition analysis, without random sampling analysis,
places a heavy strain on a computer system in terms of memory usage and
typically requires multiple dataspaces. Random sampling relieves
the strain on the computer system in terms of processing and memory
requirements. Much less memory is required to analyze 20,000 sampled
records using the random sampling approach than to analyze
2,000,000,000 records without sampling. However, in order to...

...with an unsampled approach which may be desirable under some
circumstances, the preferred method using random sampling analysis
utilizes one or more of each of the following types of dataspaces:
index, key and statistics...

Description of the Invention: 

...contain up to 8 gigabytes (GB) in keys, on a computer system having
RAM 18 dataspaces of up to 2 [sup] 31 bytes (2 GB), four dataspaces are
required to store the keys. Another 2 GB are sufficient to store indices
to the keys. However, the record statistics, even when compressed, may
require dozens of dataspaces. To minimize the effort of storing and
sorting, the present invention randomly samples a database and
produces an extrapolated partition analysis 22 providing sufficiently
accurate results. Preferably, the sample size selected is sufficiently
small so that three dataspaces will suffice, one each for indices,
keys, and statistics...

...0044] An analysis program 16, in communication with the DBMS 14,
partitions a random sample size of S records, and then scales the
tabulated numbers by the ratio of the... 0080] It should be realized that
the memory required by a partition analysis, even when random sampling
is employed can be large and, consequently, multiple dataspaces may be
required. For databases organized with indexes and keys, sampling may
require one or more dataspaces, e.g. one or more index dataspaces,
one or more key dataspaces, and one or more statistics dataspaces.

...0081] After random sampling has been performed by either sampling 
method, and analysis program 16 has performed necessary partition 



Non-exemplary or Dependent Claim(s):

...12. The method as set forth in claim 1, wherein the step of randomly
sampling said S records includes randomly sampling the S
records utilizing dataspaces including: at least one index
dataspace; at least one key dataspace; and, at least one
statistics dataspace
Description of the Invention:

...The approach employs an iterative algorithm based on subset
refinements to nonlinearly map a small random sample which reflects
the overall structure of the data, and then "learns" the underlying
nonlinear transform...

...networks, each specializing in a particular domain of the feature space.
The partitioning of the data space can be carried out using a
clustering methodology. This local approach eliminates a significant
portion...
...alone for their conceptual elegance and ability to reproduce the
topology and structure of the data space in a faithful and unbiased
manner. Unfortunately, all known algorithms exhibit quadratic time
...principle of probability sampling, the method employs an algorithm to
Description of the Invention:

...of the first layer sum and threshold devices 14 of FIG. 2 is seen to
randomly sample the data space.



